Evaluation of name prefix and suffix during a search

ABSTRACT

A method, system, and computer readable medium for evaluating a string input for a name, categorizing the words within the string into different fields, and using the relations to make better or exact hits during a name search possible.

BACKGROUND

Searching through databases is becoming more complex as storage capacity is increased and more information can be stored. With databases that contain large amounts of information, particularly about people, the number of retrievable names can be significant. Furthermore, many people have variations on names, particularly in affixes, that make their names difficult to search. Many people also have similar names or similar surnames. Searching for a common name may lead to a large processing time. In addition, a result list returned to a user may be too large, may not contain variations of certain parts of names, may contain redundant names, or may be inaccurate. The large amount of information in a person's name, and the various parts to a person's name, are currently not utilized in helping to narrow the field of search for a name. At the same time, requiring a user to input parts of a person's name into separate search fields respectively corresponding to those parts may be confusing to a user and requires more time than entering the complete name in a single field. There is also a possibility of human error if part of a name is placed in the wrong field. Thus, a method is needed to increase the number of variables that are searchable without increasing the complexity of the user input.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a possible configuration of a system capable of using an embodiment of the invention.

FIG. 2 a illustrates an example lookup table.

FIG. 2 b illustrates an example list of the fields in a customizing table.

FIG. 2 c illustrates an example description of the fields in a customizing table, including the various types of affixes.

FIG. 2 d illustrates an example customizing table.

FIG. 3 illustrates an example user input box in a user interface.

FIG. 4 illustrates the example logic that may be performed in order to determine the parts of the name input in the example input box of FIG. 3 using the example lookup table of FIG. 2 a and the example customizing table in FIG. 2 d.

FIG. 5 illustrates the general logic that may be used by an embodiment of the system to return a result list given a string; in this example, the string is a name.

DETAILED DESCRIPTION

In order to decrease the number of entries in a result list from a database search, differentiation between name parts is utilized to get more exact hits. In order to not confuse a user and allow flexibility, rather than presenting a user with multiple fields representing parts of names, a user can enter the name once, and an embodiment parses the name. An embodiment may take permutations of the name and place them into a lookup table. As the embodiment traverses the index of the lookup table, each permutation of the name is checked against a customizing table, which contains a list of all possible affixes in the name database. An affix may be any subpart of a name, e.g., a suffix, prefix, etc. Different parts of the name are found by iterating through a lookup table and determining affix type by searching through a customizing table. One advantage of the procedure performed by the embodiment is to identify all affixes (such as “Dr.”, “Van”, etc.) that are in an input name string. Once this information on the parts of the name is identified, it is used to retrieve names (and any other corresponding information) from the name database. The result list may then be returned to the user. This procedure may be used when the entire name is entered into a single field.

FIG. 1 illustrates a possible configuration of a system capable of using an embodiment of the invention. A user 100 inputs an employee name 104 into a user interface 105 on a computing device 101. The computing device 101 takes the input and the embodiment processes the information 106 and communicates with a server 103 over a communication medium 102 to retrieve a result list 108 which is returned to the user 100. The computing device 101 can be any hardware that has processing or computational capability, such as a laptop, handheld device, etc. The communication medium 102 can be an intranet or the internet, over a wireless or wired connection (e.g., an Ethernet cable). The server 103 can hold database information, and one can distribute the functional modules of an embodiment across one or more server computers 103 as appropriate.

FIG. 2 a illustrates an example lookup table. The lookup table is created from an example input string “Dr. Van Muller Baron”. All possible permutations of the name are placed in the lookup table under the heading “Value.” The numbers in the left-most column indicate the index number of each string, and the right-most column under the heading “Index” represents the index number to place a marker indicating the next value to be read if the current “value” in that row is found in the customizing table. The marker is the placeholder of an embodiment to allow it to keep track of which “value” is being processed. The “0” index value means to “exit” the lookup table search, indicating that there are no more values to be processed, or that all the values have been processed. However, any value may be used to indicate that the search can be exited as long as the number is not an index number in the lookup table. The “Index” number for a row is determined by finding the index of the first row whose string contains the words that remain once the value in the current row is found. For example, if the value “Dr.” is found in the customizing table then the corresponding Index is “5”, meaning that the next value to be processed possibly contains any permutation of the remaining value of “Van Muller Baron.”
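By way of illustration only, the construction of such a lookup table might be sketched as follows in Python. The sketch assumes, based on the values and index numbers recited for the example string, that the “permutations” are the contiguous runs of words in the string, ordered by starting word and then from longest to shortest. The function name build_lookup_table and the tuple layout are hypothetical and do not appear in the figures.

```python
def build_lookup_table(name):
    """Return rows of (index, value, next_index) for each contiguous run of words.

    Assumption: rows are ordered by starting word, longest run first,
    and 0 is used as the exit value.
    """
    words = name.split()
    n = len(words)

    # 1-based index of the first row whose value starts at each word position.
    first_row = []
    row = 1
    for start in range(n):
        first_row.append(row)
        row += n - start

    table = []
    for start in range(n):
        for end in range(n, start, -1):
            value = " ".join(words[start:end])
            # If words remain after this run, point at their first row;
            # otherwise use the exit value 0.
            next_index = first_row[end] if end < n else 0
            table.append((len(table) + 1, value, next_index))
    return table


for index, value, next_index in build_lookup_table("Dr. Van Muller Baron"):
    print(index, value, next_index)
```

For the example string this sketch yields ten rows, with “Dr.” at index 4 pointing to index 5 and “Baron” at index 10 pointing to the exit value.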

FIG. 2 b illustrates an example list of the fields in a customizing table. The customizing table contains two fields in this example embodiment: the “ART” field and the “TITLE” field. The descriptions are also in the figure. The data type indicates the storage type of the fields, and in this example embodiment they are both of type “CHAR” meaning characters. Length in this example would be the number of characters, and in this case the “ART” field is a single character while the “TITLE” may be as long as 15 characters. This is adjustable depending on the information that is found in the customizing table.

FIG. 2 c illustrates an example description of fields that may be in a customizing table, including the various types of affixes. For example, the single ‘S’ character is a suffix that represents an “academic title after name,” such as the “M.D.” for a medical doctor or the “Esq.” for esquire that comes after a lawyer's name. A ‘T’ character represents a prefix indicating that it is an “academic title before name,” such as a “Dr.” string used to indicate those that are medical doctors or doctors of philosophy (a.k.a. a Ph.D. or philosophiae doctor). A user may change the types of characters representing various affixes in the description of fields, as well as in the customizing table.

FIG. 2 d illustrates an example customizing table. The customizing table may contain not only affixes that are common in names, but also variations of spelling for the affixes. For example, the suffix “the Second” may appear as “II” or “the 2nd” or “the 2^(nd)”. Also, the academic title “MD” for medical doctor may also exist in the customizing table as “M.D.”. The title “Ph.D.” may also be listed as “PhD.” Having variations on affixes allows for an embodiment to account for alternative (or even commonly mistaken) spellings of affixes. All the affixes in the example are listed together in a single customizing table, as shown by the example ART: ‘T’ corresponding to the value “Dr.” 200, ‘V’ corresponding to the value “Van” 201, and ‘Z’ corresponding to the value “Baron” 202. However, a customizing table may also be divided into different tables by affix type.
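As a non-limiting sketch, a single customizing table holding spelling variants might be represented in Python as a mapping from the TITLE value to its ART character. Only entries whose ART character is stated in this description are included; the names CUSTOMIZING_TABLE and affix_type are hypothetical.

```python
# TITLE value -> ART character. 'T', 'V', 'Z' and 'S' are the codes named
# in the description; the dict itself is an illustrative assumption.
CUSTOMIZING_TABLE = {
    "Dr.": "T",    # academic title before name
    "Van": "V",
    "Baron": "Z",
    "M.D.": "S",   # academic title after name
    "MD": "S",     # spelling variant of the same affix
    "Esq.": "S",
}


def affix_type(value):
    """Return the ART character for a value, or None if it is not an affix."""
    return CUSTOMIZING_TABLE.get(value)


print(affix_type("MD"), affix_type("M.D."), affix_type("Muller"))
```

Listing each variant as its own key keeps the check a single exact-match lookup, at the cost of maintaining one entry per spelling.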

FIG. 3 illustrates an example user input box in a user interface. A user interface searching for names will have a single input 300, rather than multiple inputs. This is because the embodiment will process the name to separate the name into its various parts and then return the result list based on its parsing and search of the name. The example in FIG. 3 has the name “Dr. Van Muller Baron, Peter Smith.” The embodiment would automatically parse out the First Name, “Peter”, and the Middle Name “Smith” leaving only the various affixes and parts of the Last Name. The lookup table for the remaining string “Dr. Van Muller Baron” is already in the example lookup table provided in FIG. 2 a.
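The description does not state how the first and middle names are parsed out. One possible approach, sketched here purely as an assumption, is to treat the comma in the single input as the separator between the last-name portion and the given names. The function name split_given_names is hypothetical.

```python
def split_given_names(raw):
    """Split 'LAST PART, FIRST MIDDLE' into (last_part, first, middle).

    Assumption: a comma separates the last-name portion from the given names.
    """
    last_part, _, given_part = raw.partition(",")
    given = given_part.split()
    first = given[0] if given else ""
    middle = " ".join(given[1:])
    return last_part.strip(), first, middle


print(split_given_names("Dr. Van Muller Baron, Peter Smith"))
```

Under this assumption, only the portion before the comma would be passed on to the lookup table step.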

FIG. 4 illustrates the example logic that may be performed in order to determine the parts of the name input in the example input box of FIG. 3 using the example lookup table of FIG. 2 a and the example customizing table in FIG. 2 d. Starting at Step 400 the first value is queried, in this case the value is “Dr. Van Muller Baron.” If “Dr. Van Muller Baron” was in the customizing table 402, then Step 401 would be executed because the corresponding index is “0” 401 and the lookup table search would exit 408, meaning that the parsing and separation of the name into parts has been completed. However, in this example, “Dr. Van Muller Baron” is not in the customizing table and thus the next value is queried 403. If the value at the next index, “Dr. Van Muller”, was in the customizing table 404, the corresponding index is “10” 405, the marker of the embodiment would be “10” and the value at “10” would be the next value searched. However, “Dr. Van Muller” is not found in the customizing table, thus, the next value is queried 409 and the embodiment would search the customizing table for the value “Dr. Van” 410. If the value “Dr. Van” were found in the customizing table 410, then the corresponding index would equal “8” 411, the marker would be at “8”, and the value at “8,” which is “Muller Baron,” would be searched in the customizing table. In this example, “Dr. Van” is also not in the customizing table and the next value is queried 417. The next value “Dr.” is found in the customizing table 418 as an affix of ART ‘T’ 200. Thus the corresponding index is “5” 419 and the marker would be at “5”; however, one can note that even if “Dr.” were not found in the customizing table, the next value searched 421 would have been at index “5”.

Since the value “Dr.” has been found, the embodiment now has to determine whether the rest of the name has any particular affix values. Thus the remaining values are essentially all permutations of the original string “Dr. Van Muller Baron” without the string that was already found “Dr.”, leaving all the permutations of the words “Van Muller Baron”. The value “Van Muller Baron” is the value at index “5” and had it been found in the customizing table 420, the corresponding index and marker would be “0” 401 and the search could exit 408. However, the value “Van Muller Baron” is not in the customizing table, and the next value queried 422 is “Van Muller”, which is also not found in the customizing table. Had “Van Muller” been found 423 then the index would be “10” and the marker would be placed at “10”. However, the value “Van Muller” is not in the customizing table and the next value queried 424 is “Van”. The embodiment would query the value “Van” in the customizing table 425 and since it exists as the affix of ART ‘V’, the index would be “8” and the marker would be “8”; however, one can again note that even if “Van” were not found in the customizing table, the next value searched 426 would have been at index “8”.

The value at index “8” is “Muller Baron.” If “Muller Baron” were found in the customizing table 412, the index would be “0” 401 and the search could exit 408. However, the value “Muller Baron” is not in the customizing table, and the next value queried 413 is “Muller”, which is also not in the customizing table.

However, one can note that whether “Muller” is found 414 is irrelevant because had “Muller” been in the customizing table, the marker would have pointed to index “10” 416. The next value queried 415 is “Baron.” If “Baron” were not in the customizing table 406, as it is the last entry the next value in the index 407 is automatically index “0” 401, and the search would exit 408. However, “Baron” does exist in the customizing table as the affix of ART ‘Z’ and the index marker is now “0” 401 meaning the search exits 408.

FIG. 5 illustrates the general logic that may be used by an embodiment of the system to return a result list given a string; in this example, the string is a name. First, an embodiment would retrieve a string from an input 500, similar to that of FIG. 3. Next, a lookup table would be created 501, similar to that of FIG. 2 a. Next, an embodiment would point a marker to the first index value 502 and then perform a recursive search step on the lookup table. An embodiment would retrieve a value from the index row to which it is pointing 503. If the current value were in a customizing table 504, then the relevant field information of that value would be temporarily stored 505, and the marker would be given the number in the corresponding index. If the index was “0” 507 then the embodiment would exit out of the search. The exit value does not necessarily have to be a “0”. It need only be a unique identifier that is not an index number that exists in the lookup table.

If the current value were not in the customizing table, then the marker would be incremented to the next value 508. If there were not a next index value 509 then the lookup table search would exit, otherwise the next value would be queried 503 and the search would repeat in the remaining values in the lookup table.
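The marker-driven traversal described above might be sketched as follows, using the example lookup table values recited for FIG. 2 a and the three affixes recited for FIG. 2 d. This is a minimal illustration written as a plain loop rather than a recursive step; the names LOOKUP_TABLE, CUSTOMIZING_TABLE, EXIT and find_affixes are hypothetical, and the step numbers in the comments refer to FIG. 5.

```python
# index -> (value, index to jump to if the value is found)
LOOKUP_TABLE = {
    1: ("Dr. Van Muller Baron", 0),
    2: ("Dr. Van Muller", 10),
    3: ("Dr. Van", 8),
    4: ("Dr.", 5),
    5: ("Van Muller Baron", 0),
    6: ("Van Muller", 10),
    7: ("Van", 8),
    8: ("Muller Baron", 0),
    9: ("Muller", 10),
    10: ("Baron", 0),
}
CUSTOMIZING_TABLE = {"Dr.": "T", "Van": "V", "Baron": "Z"}
EXIT = 0  # any identifier that is not a lookup table index would do


def find_affixes(lookup, customizing):
    """Walk the lookup table and collect (value, ART) for each affix found."""
    found = []
    marker = 1                                # step 502
    while marker != EXIT:
        value, next_index = lookup[marker]    # step 503
        art = customizing.get(value)          # step 504
        if art is not None:
            found.append((value, art))        # step 505
            marker = next_index               # follow the Index column
        elif marker + 1 in lookup:            # steps 508 and 509
            marker += 1
        else:
            marker = EXIT
    return found


print(find_affixes(LOOKUP_TABLE, CUSTOMIZING_TABLE))
```

For the example tables the sketch collects “Dr.” as ART ‘T’, “Van” as ART ‘V’ and “Baron” as ART ‘Z’, in that order, before reaching the exit value.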

Once the lookup table search is complete, the information stored in step 505 is used to retrieve names from the database 510. Also, any parts of a name that were not found in the customizing table may by default be listed in the last name field. Of course, the default information may be adjustable depending on the information found in the database or adjustments by a user.
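The default handling of unmatched words might, for example, be sketched as below. The sketch assumes the affixes found during the lookup table search are available as (value, ART) pairs; the function name default_last_name is hypothetical.

```python
def default_last_name(name, found_affixes):
    """Return the words of `name` that were not matched as affixes.

    Assumption: `found_affixes` is a list of (value, ART) pairs, and any
    word left over defaults to the last name field.
    """
    remaining = name.split()
    for value, _art in found_affixes:
        for word in value.split():
            if word in remaining:
                remaining.remove(word)  # drop one occurrence per matched word
    return " ".join(remaining)


found = [("Dr.", "T"), ("Van", "V"), ("Baron", "Z")]
print(default_last_name("Dr. Van Muller Baron", found))
```

With the affixes from the running example, the only word left for the last name field is “Muller”.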

Since the information stored provides the corresponding fields, each part of the name would not have to be searched in all the fields of a database table. Thus, an advantage of the embodiment is not only to save the user the hassle of inputting various strings into multiple search fields, but also to save system resources by reducing the number of fields of a table searched. For example, if the value “Dr.” is known to be a prefix of ART ‘T’, then only names that contain an academic title before the name would be potential matches. The “Dr.” value would not be searched in the suffix fields of the database. Furthermore, if the ART ‘V’ value “Van” were determined, then this information would only be used to whittle down the remaining names that are possible matches. Thus, among names with “Dr.”, the remaining list would only be names that also contained an aristocratic prefix between the first and second name with the value “Van.” This would continue until all the information is matched with the result list. The result list is provided to the user and the embodiment exits 510.
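The field-by-field narrowing described above might be illustrated as follows. The record layout, the field names title_before and prefix, and the mapping FIELD_FOR_ART are all hypothetical; an actual name database would define its own fields.

```python
# Hypothetical mapping from ART character to the one database field searched.
FIELD_FOR_ART = {"T": "title_before", "V": "prefix"}

# Hypothetical records standing in for rows of a name database.
RECORDS = [
    {"title_before": "Dr.", "prefix": "Van", "last_name": "Muller"},
    {"title_before": "Dr.", "prefix": "", "last_name": "Muller"},
    {"title_before": "", "prefix": "Van", "last_name": "Muller"},
]


def narrow(records, found_affixes):
    """Keep only records whose matching field equals each found affix."""
    candidates = list(records)
    for value, art in found_affixes:
        field = FIELD_FOR_ART.get(art)
        if field is None:
            continue  # no field known for this affix type in this sketch
        # Each affix is compared against a single field, not every field.
        candidates = [r for r in candidates if r.get(field) == value]
    return candidates


print(narrow(RECORDS, [("Dr.", "T"), ("Van", "V")]))
```

In this sketch “Dr.” first reduces the three records to two, and “Van” then reduces those two to the single record carrying both affixes.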

An advantage of an embodiment is that with a reduced list of names found, a user would make fewer calls to the database for all the information that corresponds to the names returned. Furthermore, the result list may be returned as a list of full names, or the result list can return a list broken down by the various parts of the name. An advantage of the result list is that the information is expandable so that if other search engines have input fields that are broken down by parts of names, the user would have this information.

Several embodiments of the present invention are specifically illustrated and described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.

1. A method comprising: parsing a first character string taken from at least one field; creating a lookup table using the first character string; iterating through the lookup table; selecting first values from a customizing table on the iteration of the lookup table; selecting second values from a database using the first values selected from the customizing table; returning a result list comprised of the plurality of second values selected from the database.
2. A method according to claim 1, wherein the first character string is a name.
3. A method according to claim 2, wherein a first name and middle name are parsed out of the first character string.
4. A method according to claim 1, wherein the lookup table contains string values.
5. A method according to claim 4, wherein the lookup table contains index values.
6. A method according to claim 5, wherein the string values are permutations of words within the first character string.
7. A method according to claim 6, wherein the index values are numbers used to determine the next string value to search in the lookup table if a current string value is not found in the customizing table.
8. A method according to claim 1, wherein the first values are parts of a name.
9. A method according to claim 1, wherein the first values are associated with words within the first character string.
10. A method according to claim 1, wherein the each first value is an affix.
11. A method according to claim 1, wherein the words in the first character string that are not found in the customizing table are last names.
12. A method according to claim 1, wherein the second values are names.
13. A method according to claim 12, wherein the second values are information corresponding to the names.
14. A system comprising: a computing device for: parsing a first character string taken from at least one field, creating a lookup table using the first character string; iterating through the lookup table; selecting first values from a customizing table on the iteration of the lookup table; and a database for providing second values using the first values selected from the customizing table; and a display for returning a result list comprised of the plurality of second values selected from the database.
15. A system according to claim 14, wherein the first character string is a name.
16. A system according to claim 15, wherein a first name and middle name are parsed out of the first character string.
17. A system according to claim 14, wherein the lookup table contains string values.
18. A system according to claim 17, wherein the lookup table contains index values.
19. A system according to claim 18, wherein the string values are permutations of words within the first character string.
20. A system according to claim 19, wherein the index values are numbers used to determine the next string value to search in the lookup table if a current string value is not found in the customizing table.
21. A system according to claim 14, wherein the first values are parts of a name.
22. A system according to claim 14, wherein the first values are associated with words within the first character string.
23. A system according to claim 14, wherein the each first value is an affix.
24. A system according to claim 14, wherein the words in the first character string that are not found in the customizing table are last names.
25. A system according to claim 14, wherein the second values are names.
26. A system according to claim 25, wherein the second values are information corresponding to the names.
27. A computer readable medium containing instructions that when executed result in a performance of a method comprising: parsing a first character string taken from at least one field; creating a lookup table using the first character string; iterating through the lookup table; selecting first values from a customizing table on the iteration of the lookup table; selecting second values from a database using the first values selected from the customizing table; returning a result list comprised of the plurality of second values selected from the database.
28. A computer readable medium according to claim 27, wherein the first character string is a name.
29. A computer readable medium according to claim 28, wherein a first name and middle name are parsed out of the first character string.
30. A computer readable medium according to claim 27, wherein the lookup table contains string values.
31. A computer readable medium according to claim 30, wherein the lookup table contains index values.
32. A computer readable medium according to claim 31, wherein the string values are permutations of words within the first character string.
33. A computer readable medium according to claim 32, wherein the index values are numbers used to determine the next string value to search in the lookup table if a current string value is not found in the customizing table.
34. A computer readable medium according to claim 27, wherein the first values are parts of a name.
35. A computer readable medium according to claim 27, wherein the first values are associated with words within the first character string.
36. A computer readable medium according to claim 27, wherein the each first value is an affix.
37. A computer readable medium according to claim 27, wherein the words in the first character string that are not found in the customizing table are last names.
38. A computer readable medium according to claim 27, wherein the second values are names.
39. A computer readable medium according to claim 38, wherein the second values are information corresponding to the names.